ABSTRACT

The empirical Type I error rates of Poly-DIMTEST (H. Li and
W. Stout, 1995) and the LISREL8 chi square fit statistic (K. Jöreskog and D.
Sörbom, 1993) were compared with polytomous unidimensional data sets
simulated to vary as a function of test length and sample size. The rejection
rates for both statistics were also studied with two-dimensional data sets
simulated to vary as a function of test length, sample size, and degree of
correlation between latent traits. Severely inflated Type I error rates were
obtained with the LISREL8 chi square statistic in all conditions, with the
exception of the 10-item data sets simulated to contain 500 and 1,000
simulees. Poly-DIMTEST T-statistic empirical Type I error probabilities were
at or near nominal values for the three sample sizes examined. In addition, the
performance of the latter statistic was unaffected by the manipulation of
sample size. Rejection rates using the LISREL8 chi square fit statistic were
high across all simulated two-dimensional conditions, although results were
encouraging for 10-item data sets containing 500 or 1,000 simulees. It
appeared that neither procedure worked well with samples of fewer than 500
examinees. Results do suggest that with samples of 500 or more examinees, the
LISREL8 chi square statistic can be useful for assessment of dimensionality,
but the Poly-DIMTEST T-statistic lacks the power needed for use with tests
of fewer than 20 items. (Contains 3 tables and 38 references.) (SLD)
Assessing the Dimensionality of Polytomous Item Responses with Small Sample Sizes and Short Test Lengths:
A Comparison of Procedures

The assumption of unidimensionality is central to item response theory (IRT). Common IRT models
assume that the probability of a correct response to a given item can be modeled as a function of a single person
parameter ($\theta$), usually interpreted as the proficiency underlying the item response matrix (Hambleton &
Swaminathan, 1985). In practice this assumption is rarely met, given that an item response will often be dependent
not only upon the hypothesized ability but also on several ancillary proficiencies.

A considerable body of research has been dedicated, over the past fifteen years, to developing indices and
statistics to assess the underlying dimensionality of item response matrices (cf. De Champlain & Gessaroli, in
press, for a review). At present, indices and statistics based on Stout's concepts of essential independence and
essential dimensionality (DIMTEST) have been shown to be useful for assessing the dimensionality of
dichotomously-scored responses in several conditions (Hattie, Krakowski, Rogers, & Swaminathan, 1996;
Nandakumar, 1991; 1994; Nandakumar & Stout, 1993; Stout, 1987; 1990). Similarly, the use of fit indices and
statistics based on a nonlinear factor analysis (NLFA) of an item response matrix to assess the dimensionality of a
given data set has also proven to be helpful with binary item responses (De Champlain, 1996; De Champlain &
Tang, 1997; Gessaroli, 1994; Gessaroli & De Champlain, 1996; Hattie, 1984; 1985; McDonald & Mok, 1995).
In addition, De Champlain and Gessaroli (1997) have shown that factor analytic models implemented in common
software packages (e.g., PRELIS2/LISREL8 (Jöreskog & Sörbom, 1993a; 1993b)) are promising with respect to
assessing the dimensionality of dichotomously-scored items.

The body of research focusing on the assessment of dimensionality for polytomously scored items is,
however, more sparse (De Ayala, 1994; 1995). De Ayala (1994; 1995) has shown that the accuracy with which
item and ability parameters can be estimated for both the graded-response and partial credit models is questionable
in certain multidimensional conditions that were examined. In light of the case-specificity problem pervasive with
performance assessments (Linn & Burton, 1994; Shavelson, Baxter, & Gao, 1993), identifying the nature of the
composite would seem to be of the utmost importance.
In response to this issue, Li & Stout (1994; 1995) proposed a polytomous extension of their DIMTEST
procedure (Poly-DIMTEST). The T-statistic, computed within the Poly-DIMTEST software, appeared to maintain
low Type I error rates (close to the nominal value) with simulated unidimensional data sets. However, the power of
the statistic in correctly rejecting unidimensionality with simulated two-dimensional data sets was low with small
sample sizes (fewer than 1000) and short tests (fewer than 25 items). NLFA-based procedures and accompanying fit
statistics have also been proposed for unidimensional and multidimensional polytomous item response models
(Bartholomew, 1983; Christoffersson & Gunsjö, 1996; Jöreskog, 1994; Muthén, 1984).

Although these procedures are promising, little research has been undertaken to examine their behavior
in more realistic testing conditions. In particular, few investigations have
focused on examining the Type I error rates and power of these statistics with small sample sizes and short test
lengths. A large number of performance assessments are composed of very few items or tasks. Portfolios often
contain no more than a dozen scoreable sections (Moss, Beck, Ebbs, Matson, Muchmore, Steele, & Taylor, 1992;
Nystrand, Cohen, & Dowling, 1993). Similarly, performance assessments that are being considered for inclusion
into the United States Medical Licensure Examination contain fewer than 20 scored tasks (Clauser, Subhiyah,
Nungester, Ripkey, Clyman, & McKinley, 1995; De Champlain & Klass, 1997). A study examining the behavior of
polytomous dimensionality assessment procedures with small sample sizes and short test lengths might therefore
yield beneficial information for performance assessments administered within a variety of contexts for national
as well as local examinations.

Purpose 

The two primary objectives of this investigation were as follows: 

To estimate and compare the empirical Type I error rates of Poly-DIMTEST (Li & Stout, 1995) and the
LISREL8 chi-square fit statistic (Jöreskog & Sörbom, 1993a; 1993b) with polytomous unidimensional data
sets simulated to vary as a function of test length and sample size;

To examine the rejection rates for both statistics with two-dimensional data sets simulated to vary as a
function of test length, sample size, and degree of correlation between latent traits.

Methods

Unidimensional conditions 

In the first part of this investigation, the empirical Type I error rates of both statistics were examined under 
various conditions. Unidimensional polytomous item response vectors were simulated using the generalized partial- 
credit IRT model (Muraki, 1992; 1997) which states that the probability of reaching a particular score category k 

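(denoted $P_{jk}(\theta)$) on item j is given by

$$P_{jk}(\theta) = \frac{\exp\left[\sum_{v=0}^{k} Z_{jv}(\theta)\right]}{\sum_{c=0}^{K_j} \exp\left[\sum_{v=0}^{c} Z_{jv}(\theta)\right]} \qquad (1)$$

with

$$Z_{jv}(\theta) = a_j(\theta - b_{jv}),$$

and where

$a_j$ = the item discrimination parameter for item j;

$b_{jk}$ = the threshold (or step) parameter for item j and category k;

$K_j$ = the highest score category of item j;

$\theta$ = the proficiency estimate.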

Note that $b_{jk}$ can be further decomposed additively into two parts:

$b_j$ = the item location parameter, i.e., the overall difficulty of item j;

$d_v$ = the relative difficulty of step v in comparison to other steps within item j.

In other words, the probability that a randomly selected simulee of ability level $\theta$ has of reaching score
category k rather than k-1 can be estimated as a function of how well the item discriminates between test takers of
varying ability as well as the difficulty level of the item and the difficulty associated with reaching a given step in
comparison to other steps.
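The category probabilities of the generalized partial-credit model can be sketched in a few lines of Python. This is only an illustration, not the code used in the study: the function name `gpcm_probabilities` is hypothetical, the cumulative sum for the lowest category is fixed at zero (a common identification convention that is assumed here), and the Table 1 step values for item 9 are treated directly as the thresholds $b_{jk}$, which is also an assumption.

```python
import math


def gpcm_probabilities(theta, a, thresholds):
    """Return the probabilities of score categories 0..K for one item.

    Each step v adds a * (theta - b_v) to a running sum; the running sum
    for category 0 is fixed at 0 (assumed convention).
    """
    cumulative = [0.0]
    for b in thresholds:
        cumulative.append(cumulative[-1] + a * (theta - b))
    peak = max(cumulative)  # subtracting the maximum avoids overflow in exp()
    weights = [math.exp(z - peak) for z in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


# Item 9 of Table 1 (a = 0.598, three steps), evaluated at theta = 0.
probs = gpcm_probabilities(theta=0.0, a=0.598, thresholds=[1.448, 0.215, -1.663])
print([round(p, 3) for p in probs])
```

Under this parameterization the log-odds of category k against category k-1 equal $a_j(\theta - b_{jk})$, which is the "k rather than k-1" interpretation given in the text.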

In addition, the unidimensional polytomous data sets were generated according to three sample sizes (250,
500 and 1000 simulees) as well as two test lengths (10 and 20 items).

In order to simulate realistic item responses, parameters for the data generation were selected from 
PARSCALE (Muraki & Bock, 1993) estimates obtained from a nationally administered standardized patient 
examination (SPX). These are presented in Table 1.

Insert Table 1 about here 



The 20-item data sets were composed of two 10-item tests, i.e., the parameters used to simulate responses
to items 1-10 were identical to those employed to generate responses to items 11-20. Also, note that the response
variable contained five levels for items one through three whereas it included four levels for items four through 10.
Proficiencies were randomly generated from a N(0,1) distribution. Each cell of this 2 x 3 design (test length x
sample size) was replicated 100 times for a total of 600 unidimensional data sets.
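The layout of the unidimensional design can be written down as a short Python sketch. It only enumerates the conditions and draws proficiencies; the names (`item_labels`, `draw_proficiencies`, `CATEGORIES_PER_ITEM`) and the seed are hypothetical and are not taken from the original study.

```python
import itertools
import random

TEST_LENGTHS = (10, 20)
SAMPLE_SIZES = (250, 500, 1000)
REPLICATIONS = 100

# Number of response levels per Table 1 item: five for items 1-3, four for items 4-10.
CATEGORIES_PER_ITEM = {item: (5 if item <= 3 else 4) for item in range(1, 11)}


def item_labels(test_length):
    """A 20-item test reuses the ten Table 1 parameter sets twice."""
    return list(range(1, 11)) * (test_length // 10)


def draw_proficiencies(n_simulees, rng):
    """Draw one proficiency per simulee from a N(0,1) distribution."""
    return [rng.gauss(0.0, 1.0) for _ in range(n_simulees)]


conditions = list(itertools.product(TEST_LENGTHS, SAMPLE_SIZES))
total_data_sets = len(conditions) * REPLICATIONS
print(len(conditions), total_data_sets)  # 6 cells, 600 data sets

rng = random.Random(1997)  # arbitrary seed, for reproducibility of the sketch only
thetas = draw_proficiencies(250, rng)
```

Responses for each simulee would then be drawn item by item from the category probabilities implied by equation (1).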

Two-dimensional conditions 

In the second part of this investigation, two-dimensional polytomous item response vectors were simulated 
using a multidimensional extension of the generalized partial credit model (Muraki, 1992; 1997) given by 




$$Z_{jk}(\boldsymbol{\theta}) = \sum_{m=1}^{M} a_{jm}\theta_m + c_{jk}, \qquad (2)$$

where

$a_{jm}$ = a slope parameter of item j and the m-th (m = 1, 2, ..., M) latent trait dimension;

$c_{jk}$ = an intercept parameter for item j and category k (k = 1, 2, ..., $K_j$);

$\boldsymbol{\theta}$ = a proficiency vector.

These two-dimensional item response vectors were also simulated according to the same three sample sizes
and two test lengths outlined in the previous section, as well as according to:

dimension dominance: 50% of the items required knowledge of $\theta_1$ only and the remaining
items required knowledge of $\theta_2$;

and

inter-proficiency correlation: 0.0, 0.3, and 0.6.

The parameters previously outlined with the unidimensional conditions were utilized in the two-dimensional
simulations. As suggested by Muraki (1997), the intercept parameters corresponding to the threshold
values outlined in Table 1 were obtained using the following formula:

$$c_{jk} = -a_j b_{jk}, \qquad (3)$$

where $c_{jk}$, $a_j$, and $b_{jk}$ have been previously defined. Finally, proficiencies were randomly generated from a N(0,1)
distribution. Each cell of this 2 x 3 x 3 design (test length by sample size by level of inter-proficiency correlation)
was replicated 100 times for a total of 1800 two-dimensional data sets.
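A minimal Python sketch of the two-dimensional setup follows. It converts thresholds to intercepts with equation (3), draws a pair of standard normal proficiencies with a target correlation, and evaluates the linear term of equation (2). The function names are hypothetical, and the way the correlated pair is generated (a linear combination of two independent normal draws) is an assumption, since the original generation routine is not described.

```python
import math
import random


def intercepts(a, thresholds):
    """Equation (3): c_jk = -a_j * b_jk for every step of an item."""
    return [-a * b for b in thresholds]


def draw_proficiency_pair(rho, rng):
    """Two N(0,1) proficiencies whose population correlation is rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    return z1, rho * z1 + math.sqrt(1.0 - rho ** 2) * z2


def z_terms(theta, slopes, item_intercepts):
    """Equation (2): sum over dimensions of a_jm * theta_m, plus c_jk."""
    linear = sum(a_m * t_m for a_m, t_m in zip(slopes, theta))
    return [linear + c_k for c_k in item_intercepts]


# Item 9 of Table 1, loading on the first dimension only (simple structure).
a9, b9 = 0.598, [1.448, 0.215, -1.663]
c9 = intercepts(a9, b9)
theta = draw_proficiency_pair(0.3, random.Random(7))
print(z_terms(theta, slopes=(a9, 0.0), item_intercepts=c9))

# Size of the design: 2 test lengths x 3 sample sizes x 3 correlations x 100 replications.
total_data_sets = 2 * 3 * 3 * 100
```

With a zero slope on the second dimension, each term reduces to $a_j(\theta_1 - b_{jk})$, so the unidimensional model is recovered for that item.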
Analyses

Poly-DIMTEST was run and the powerful T-statistic (Nandakumar & Stout, 1993) was computed for the
unidimensional and two-dimensional data sets using all default options. Given that the general DIMTEST
procedure is well known, the reader is referred to other sources for computational details (Nandakumar & Stout,
1993; Nandakumar, Yu, Li, & Stout, 1995). Based on the results reported by Nandakumar, Yu, Li, and Stout
(1995), which suggest that using simple Pearson correlations in lieu of the more theoretically appropriate polychoric
correlations yields very similar Type I error and rejection rates for the Poly-DIMTEST T-statistic, we decided, for
simplicity's sake, to fit a linear factor analytic model to Pearson item correlations to select items for inclusion into
the AT1 subtest.

The asymptotic covariance matrix of the polychoric correlations was estimated for all data sets using
PRELIS2 (Jöreskog & Sörbom, 1993a). PRELIS2/LISREL8 (Jöreskog & Sörbom, 1993a; 1993b) is a
comprehensive structural equation modeling (SEM) package which allows the user to fit a confirmatory factor
analytic model to a polytomous item response matrix via several estimation procedures. It is therefore possible to
assess the fit of a one-factor (i.e., unidimensional) model to a data set prior to calibrating the item responses using
an IRT model. Regardless of the procedure specified, the parameters of factor analytic models in LISREL are
estimated so as to minimize the following fit function:

$$F = (s - \sigma)' W^{-1} (s - \sigma), \qquad (4)$$

where

s = Sample item covariance matrix;

$\sigma$ = Reproduced covariance matrix from the model parameters;

W = A weight matrix referred to as the correct weight matrix.

With polytomous responses, s usually corresponds to sample estimates of the threshold and polychoric correlations;
$\sigma$ contains the reproduced threshold and polychoric correlation values and W is a consistent estimator of the
asymptotic covariance matrix of s.
A chi-square goodness-of-fit statistic, provided in LISREL8 to aid in assessing model fit, is given by

$$\chi^2 = (N - 1) \times \mathrm{Min}(F), \qquad (5)$$

where N corresponds to the number of simulees in the sample and Min(F) is the minimum value of the fit function
given in equation (4) for a specific model. This statistic is distributed asymptotically as a chi-square distribution
with degrees of freedom equal to

$$0.5\,p\,(p + 1) - t,$$

where p is equal to the number of items and t is the number of independent parameters estimated in the model.
Chi-square statistic values were thus computed for all simulated unidimensional and two-dimensional data sets.
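The arithmetic of equations (4) and (5) can be illustrated with a small Python sketch. It does not reproduce LISREL8's estimation; it only evaluates the quadratic form for given vectors and converts a minimized fit value into a chi-square statistic. All function names and numeric inputs are hypothetical, and the weight matrix is assumed to be supplied already inverted.

```python
def weighted_fit(s, sigma, w_inverse):
    """Equation (4): (s - sigma)' W^{-1} (s - sigma) for vectors s and sigma."""
    residual = [s_i - g_i for s_i, g_i in zip(s, sigma)]
    size = len(residual)
    return sum(
        residual[i] * w_inverse[i][j] * residual[j]
        for i in range(size)
        for j in range(size)
    )


def chi_square(n_simulees, f_min):
    """Equation (5): (N - 1) times the minimized fit function."""
    return (n_simulees - 1) * f_min


def degrees_of_freedom(p, t):
    """0.5 * p * (p + 1) - t, with p items and t independent parameters."""
    return 0.5 * p * (p + 1) - t


# Hypothetical two-element example with an identity weight matrix.
f_value = weighted_fit([0.40, 0.55], [0.35, 0.50], [[1.0, 0.0], [0.0, 1.0]])
print(f_value, chi_square(500, f_value))
print(degrees_of_freedom(p=10, t=20))  # t = 20 is an illustrative value only
```

Because the statistic grows with N for a fixed Min(F), even small residuals can lead to rejection of the one-factor model in large samples.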

Regarding unidimensional data sets, a logit-linear analysis was undertaken to model the effects of test
length and sample size as well as the interaction of both variables with respect to decision accuracy (i.e., the number
of times the assumption of unidimensionality was accepted and rejected (Type I error)). For two-dimensional data
sets, the effects of test length, sample size, degree of inter-proficiency correlation and the various interaction terms
of the latter factors with respect to decision accuracy were also estimated via a logit-linear analysis. The logit-linear
analyses were undertaken in a forward hierarchical fashion starting with the simplest main effect and progressing
towards incrementally more complex models while adhering to the rule that higher-order effects are included in the
model solely if the corresponding lower-order effects are also included. A model was deemed acceptable if its
corresponding p-value exceeded 0.15. Effects with z-values greater than 2.00 were treated as statistically
significant. For the sake of simplicity, significant associations in the logit-linear analyses are discussed only in
light of the independent variable(s). For example, a significant decision accuracy by sample size by test length
association in the logit-linear model would be referred to as the effect of sample size by test length. Finally,
logit-linear analyses were undertaken separately for each statistic.
Results

Unidimensional conditions

The number of rejections of the assumption of unidimensionality, based on the Poly-DIMTEST T-statistic and
LISREL8 $\chi^2$ statistic, are shown for all simulated unidimensional conditions in Table 2. Due to software
restrictions, it was not possible to compute T-statistics in conditions that contained only 10 items. T-statistics were
thus estimated solely for the 20-item data sets.



Insert Table 2 about here 



A nominal Type I error probability value of .05 was selected for all analyses. Empirical Type I error rates ranged
from .03 (for T-statistic values based on data sets generated to contain 20 items and 500 simulees) to .99 (for
LISREL8 chi-square values associated with data sets simulated to contain 20 items and 250 simulees). The results
from the logit-linear analysis for the T-statistic indicate that a model solely containing the dependent variable
"decision accuracy" was sufficient to adequately account for the empirical Type I error rates, $L^2(2) = 3.458$, p = .177.
That is, empirical Type I error rates were not significantly affected by sample size. With respect to the LISREL8
$\chi^2$ statistic, logit-linear analysis results indicate that a fully-saturated model, i.e., including all associations, is
required to adequately account for the empirical Type I error rates, $L^2(0) = 0.000$, p = 1.000. The proportion of
incorrect rejections of unidimensionality for the 10-item data sets dropped from .22 (250-simulee data sets) to .10
(500-simulee data sets) and finally, .09 (1000-simulee data sets). On the other hand, empirical Type I error rates for
the 20-item data sets dropped from .99 (250-simulee data sets) to .87 (500-simulee data sets) and finally, .49
(1000-simulee data sets).
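The likelihood-ratio statistic reported for the T-statistic can be checked by hand from the Table 2 counts (9, 3 and 5 rejections per 100 data sets). The Python sketch below computes $L^2 = 2 \sum O \ln(O/E)$ for a model in which the rejection rate does not depend on sample size. The function name is hypothetical, and the closed-form p-value used here holds only for 2 degrees of freedom.

```python
import math

# Rejections per 100 unidimensional 20-item data sets, by sample size (Table 2).
t_rejections = {250: 9, 500: 3, 1000: 5}
REPLICATIONS = 100


def likelihood_ratio(rejections, replications):
    """L^2 for the model with a single rejection rate across all sample sizes."""
    expected_reject = sum(rejections.values()) / len(rejections)
    expected_accept = replications - expected_reject
    l2 = 0.0
    for observed_reject in rejections.values():
        observed_accept = replications - observed_reject
        for observed, expected in (
            (observed_reject, expected_reject),
            (observed_accept, expected_accept),
        ):
            if observed > 0:  # a zero count contributes nothing to the sum
                l2 += 2.0 * observed * math.log(observed / expected)
    return l2


l2 = likelihood_ratio(t_rejections, REPLICATIONS)
df = len(t_rejections) - 1
p_value = math.exp(-l2 / 2.0)  # chi-square upper tail, exact for df = 2 only
print(round(l2, 3), df, round(p_value, 3))
```

The result agrees with the reported $L^2(2) = 3.458$, p = .177, which supports reading the garbled statistic in the scanned text as a likelihood-ratio chi-square with 2 degrees of freedom.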

Two-dimensional conditions

Poly-DIMTEST T-statistic and LISREL8 $\chi^2$ statistic rejection rates of the assumption of unidimensionality for all
two-dimensional simulated conditions are shown in Table 3. Again, due to computational limitations, T-statistic
calculations were restricted to 20-item data sets. Also, the same nominal Type I error probability was adopted
(0.05).



Insert Table 3 about here 



Rejection rates ranged from 3/100 (Poly-DIMTEST T-statistic for data sets simulated to contain 20 items, 250
simulees and an inter-proficiency correlation of .60) to 100/100 (LISREL8 $\chi^2$ statistic for the data sets generated to
contain 20 items and 250 simulees, irrespective of inter-proficiency correlation, as well as 20-item, 1000-simulee
data sets simulated to have zero correlation between proficiencies).

The results from the Poly-DIMTEST T-statistic logit-linear analysis indicate that a fully-saturated model is
needed to significantly account for the number of acceptances and rejections of the assumption of
unidimensionality, $L^2(4) = 5.105$, p = .277. With respect to the "sample size by proficiency correlation" interaction,
the proportions of rejections of the assumption of unidimensionality for 250-simulee data sets were equal to .27, .22
and .03 for data sets where inter-proficiency correlation was respectively set at 0.00, 0.30 and 0.60. For 500-simulee
data sets, these proportions were equal to .28, .27 and .16 for data sets simulated to respectively have
inter-proficiency correlation values of 0.00, 0.30 and 0.60. Finally, with regard to 1000-simulee data sets, proportions of
rejections of the assumption of unidimensionality dropped from .64 ($r(\theta_1,\theta_2) = 0.00$) to .44 ($r(\theta_1,\theta_2) = 0.30$), and
finally .35 ($r(\theta_1,\theta_2) = 0.60$). Logit-linear analysis results for the LISREL8 $\chi^2$ statistic show that a model including all
main effects in addition to the "test length by sample size" and "sample size by proficiency correlation" interactions
was needed to adequately account for the observed proportions of acceptances and rejections of the assumption of
unidimensionality, $L^2(8) = 3.916$, p = .865. Regarding the "test length by sample size" interaction, results show that
the proportion of rejections of the assumption of unidimensionality increased for the 10-item data sets from 0.877
(250-simulee data sets) to 0.950 (500-simulee data sets), and finally 0.977 (1000-simulee data sets). However, these
rates tended to be more stable for 20-item data sets as evidenced by proportions of rejections equal to 1.00, 0.977
and 0.990 for 250-, 500-, and 1000-simulee data sets, respectively. For the "sample size by proficiency correlation"
interaction, the proportion of rejections of the assumption of unidimensionality for the 250-simulee data sets
dropped from 0.950 ($r(\theta_1,\theta_2) = 0.00$ and $r(\theta_1,\theta_2) = 0.30$) to 0.915 ($r(\theta_1,\theta_2) = 0.60$) whereas it varied from 0.975 ($r(\theta_1,\theta_2) = 0.00$)
to 0.955 ($r(\theta_1,\theta_2) = 0.30$) and finally, 0.960 ($r(\theta_1,\theta_2) = 0.60$) for data sets simulated to contain 500 simulees. Finally, for
1000-simulee data sets, proportions of rejections dropped from 1.000 to 0.985, and finally 0.965 when the degree of
correlation between underlying proficiencies was respectively set at 0.00, 0.30 and 0.60.
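The marginal proportions quoted for the LISREL8 statistic follow directly from the Table 3 counts, pooled over the factor that is not part of the interaction. The Python sketch below shows the pooling; the helper name `rejection_rate` is hypothetical, and the correlation label of the last Table 3 row, which is missing in the scanned table, is assumed to be 0.60 from the row pattern.

```python
# LISREL8 rejections per 100 data sets, keyed by
# (test length, sample size, inter-proficiency correlation), from Table 3.
lisrel_rejections = {
    (10, 250, 0.0): 90, (10, 250, 0.3): 90, (10, 250, 0.6): 83,
    (10, 500, 0.0): 96, (10, 500, 0.3): 95, (10, 500, 0.6): 94,
    (10, 1000, 0.0): 100, (10, 1000, 0.3): 99, (10, 1000, 0.6): 94,
    (20, 250, 0.0): 100, (20, 250, 0.3): 100, (20, 250, 0.6): 100,
    (20, 500, 0.0): 99, (20, 500, 0.3): 96, (20, 500, 0.6): 98,
    (20, 1000, 0.0): 100, (20, 1000, 0.3): 98,
    (20, 1000, 0.6): 99,  # correlation label assumed; it is blank in the scan
}


def rejection_rate(counts, test_length=None, sample_size=None, correlation=None):
    """Pool the cells that match the fixed factors and return rejections per data set."""
    selected = [
        count
        for (length, n, rho), count in counts.items()
        if (test_length is None or length == test_length)
        and (sample_size is None or n == sample_size)
        and (correlation is None or rho == correlation)
    ]
    return sum(selected) / (100 * len(selected))


print(round(rejection_rate(lisrel_rejections, test_length=10, sample_size=250), 3))
print(round(rejection_rate(lisrel_rejections, sample_size=250, correlation=0.6), 3))
```

For example, the 10-item, 250-simulee value pools 90, 90 and 83 rejections over 300 data sets, and the 250-simulee, 0.60-correlation value pools 83 and 100 rejections over 200 data sets.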

Discussion 

The re-emergence of performance assessments in education has enabled practitioners to measure types of
behaviors not previously targeted by traditional means such as selected response items. The need for more
"authentic" measures, however, does not preclude rigorous psychometric analyses. The assessment of
dimensionality is central to both classical and modern test theories. At the most basic level, the validity of a
score-based inference (what Messick, 1989, refers to as the structural aspect of construct validity) rests upon our
knowledge of the underlying dimensional structure of an item response matrix. The need to better understand the
structure of our data is therefore of the utmost importance. The dearth of research dedicated to the assessment of
dimensionality with polytomous data is of particular concern given the popularity of alternative forms of assessment
in education at the present time.

The findings obtained in the first part of this study focused on the performance of the Poly-DIMTEST T-
and LISREL8 chi-square statistics with unidimensional data sets. Severely inflated Type I error rates were obtained
with the LISREL8 statistic in all conditions, with the exception of 10-item data sets simulated to contain 500 and
1000 simulees. Poly-DIMTEST T-statistic empirical Type I error probabilities were at or near nominal values for the three
sample sizes examined. In fact, empirical alpha values were within two standard errors of the nominal value (.05)
for all conditions examined. In addition, the performance of the latter statistic was unaffected by the manipulation
of sample size. These results are similar to those reported by Li and Stout (1994) and Nandakumar, Yu, Li, and
Stout (1995) in their simulations. On the other hand, the interaction of test length and sample size impacted upon
Type I error rates of the LISREL8 $\chi^2$ statistic computations.
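The "within two standard errors" statement can be made concrete with a short Python check. The sketch assumes the usual binomial standard error of a proportion, sqrt(alpha(1 - alpha)/R) with R = 100 replications per cell; the text does not state how the standard error was obtained, so this formula is an assumption.

```python
import math

NOMINAL_ALPHA = 0.05
REPLICATIONS = 100

# Binomial standard error of an empirical rejection rate (assumed formula).
standard_error = math.sqrt(NOMINAL_ALPHA * (1.0 - NOMINAL_ALPHA) / REPLICATIONS)
lower = NOMINAL_ALPHA - 2.0 * standard_error
upper = NOMINAL_ALPHA + 2.0 * standard_error

# Empirical Type I error rates of the T-statistic, 20-item data sets (Table 2).
t_rates = {250: 0.09, 500: 0.03, 1000: 0.05}
within_band = {n: lower <= rate <= upper for n, rate in t_rates.items()}

print(round(standard_error, 4), round(lower, 4), round(upper, 4))
print(within_band)
```

Under this assumption the band runs from roughly .006 to .094, so all three T-statistic rates fall inside it, whereas a rate such as the .22 obtained with the LISREL8 statistic for 10-item, 250-simulee data sets does not.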

It should be pointed out, however, that the fit statistic provided in the LISREL8 package is chi-square
distributed asymptotically. It is quite likely that the empirical Type I error rates estimated with the LISREL8
chi-square fit statistic would adhere more closely to the nominal alpha level with sample sizes exceeding those that were
simulated in the present investigation. In fact, the lower Type I error rate obtained with 10-item data sets containing
500 and 1000 examinees seems to support this point. However, these larger sample sizes might represent unrealistic
testing situations for many locally-based performance assessments.

Not surprisingly, given the inflated Type I error probabilities reported with 20-item data sets, rejection
rates obtained using the LISREL8 chi-square fit statistic were high across all simulated two-dimensional conditions.
Results were encouraging, however, for 10-item data sets containing 500 or 1000 simulees. The number of
rejections of the assumption of unidimensionality in these conditions was equal to or greater than 90/100,
irrespective of the degree of inter-proficiency correlation. Nonetheless, the logit-linear analysis results indicate that
several factors impact upon the rejection rates obtained with the LISREL fit statistic. Poly-DIMTEST T-statistic
results show that the procedure lacks power in all conditions examined. Again, these findings are similar to those
reported by Nandakumar, Yu, Li, and Stout (1995). Also, the proportions of rejections were affected by the
interaction of sample size and degree of inter-proficiency correlation.

Several tentative recommendations can be made based on the results obtained in the present study. First, it
appears as though neither procedure works particularly well with samples containing fewer than 500 examinees. In
those particular conditions, the onus should probably be placed on a sound test development process to ensure that
the examination is targeting the intended constructs. Sireci and Geisinger (1995) have provided interesting
applications of multidimensional scaling to ensure content domain representation. These types of analyses might
also prove to be beneficial for assessing the structure of performance assessments administered to small samples.

With 10-item data sets containing 500 or more examinees, results suggest that the LISREL8 $\chi^2$ statistic can
be quite useful for the assessment of dimensionality. The Poly-DIMTEST T-statistic simply lacks the power needed
in order to recommend its use with data sets that contain fewer than 20 items. However, based on prior findings, it is
still advised to use the latter statistic with assessments that contain more than 20 items and 1000 examinees.
Having said this, it is important to state that these findings should be interpreted in light of several caveats.
First, the reported findings are highly dependent upon the conditions that were simulated and generalizations to
other configurations should be undertaken cautiously, if at all. For example, the item parameters selected for the
simulations obviously might not reflect all performance assessments. However, there is little reason to believe that
these parameter estimates would differ from those of other clinical skills assessments. Second, it is important to
re-emphasize that the purpose of this study was to examine the behavior of both statistics in several conditions that
would hopefully allow us to gather practical information regarding both procedures. Obviously, additional
simulations should be undertaken before making any definitive statements about the Type I and Type II error rates
of both statistics. Finally, the results obtained with the LISREL8 chi-square and Poly-DIMTEST T-statistics were
not unexpected. Inflated Type I error rates were reported in a study by De Champlain and Gessaroli (1997) that
examined the behavior of the LISREL goodness-of-fit statistic with dichotomously-scored responses. Also, past
research has clearly shown that the T-statistic does not function well with most of the conditions examined in our
study. Nonetheless, given the popularity and usefulness of the DIMTEST package more generally, we felt it
important to compare the performance of the LISREL8 chi-square statistic to that of the T-statistic.

It is hoped that the results obtained in this study will provide valuable information regarding the behavior 
of two promising polytomous dimensionality assessment procedures in conditions that approximate those found 
with clinical skills performance assessments in medicine. It is also hoped that this investigation will help to foster 
future research in the area of dimensionality assessment in general. Finally, and more importantly, more attention 
needs to be geared towards the development and application of psychological and statistical models that aptly 
capture the complex multidimensional structure of performance assessments. 
Table 1

Polytomous Item Response Parameters (PARSCALE) Used in Simulations

                         Item Step Parameters
Item      a         b1        b2        b3        b4
  1     0.134     3.636     4.803    -1.403    -7.037
  2     0.177     4.683    -1.823     1.010    -3.869
  3     0.154     9.706    -4.283    -1.270    -4.153
  4     0.160     3.675     0.665    -4.340      --
  5     0.341     1.746     0.804    -2.550      --
  6     0.257     0.544     2.171    -2.715      --
  7     0.275     3.289    -0.675    -2.614      --
  8     0.242     1.708     1.350    -3.059      --
  9     0.598     1.448     0.215    -1.663      --
 10     0.404    -1.089     1.724    -0.635      --
Table 2

Number of Rejections of the Assumption of Unidimensionality per 100 Data Sets: Unidimensional Conditions

                                      10 items                      20 items
                              N=250   N=500   N=1000        N=250   N=500   N=1000
Poly-DIMTEST T-statistic†      --      --      --             9       3       5
LISREL8 chi-square             22      10       9            99      87      51

† Due to Poly-DIMTEST restrictions, it was not possible to compute T-statistic values for 10-item data sets.
Table 3

Number of Rejections of the Assumption of Unidimensionality per 100 Data Sets:
Two-Dimensional Conditions

Procedure        Test Length   Sample Size      Proficiency Correlation r(θ1,θ2)   Number of Rejections
Poly-DIMTEST†    10 items      250 simulees     0.00                               --
                 10 items      250 simulees     0.30                               --
                 10 items      250 simulees     0.60                               --
                 10 items      500 simulees     0.00                               --
                 10 items      500 simulees     0.30                               --
                 10 items      500 simulees     0.60                               --
                 10 items      1000 simulees    0.00                               --
                 10 items      1000 simulees    0.30                               --
                 10 items      1000 simulees    0.60                               --
                 20 items      250 simulees     0.00                               27
                 20 items      250 simulees     0.30                               22
                 20 items      250 simulees     0.60                                3
                 20 items      500 simulees     0.00                               28
                 20 items      500 simulees     0.30                               27
                 20 items      500 simulees     0.60                               16
                 20 items      1000 simulees    0.00                               64
                 20 items      1000 simulees    0.30                               44
                 20 items      1000 simulees    0.60                               35

† Due to Poly-DIMTEST restrictions, it was not possible to compute T-statistic values for 10-item data sets.
Table 3 (continued)

Number of Rejections of the Assumption of Unidimensionality per 100 Data Sets:
Two-Dimensional Conditions

Procedure             Test Length   Sample Size      Proficiency Correlation r(θ1,θ2)   Number of Rejections
LISREL8 chi-square    10 items      250 simulees     0.00                                90
                      10 items      250 simulees     0.30                                90
                      10 items      250 simulees     0.60                                83
                      10 items      500 simulees     0.00                                96
                      10 items      500 simulees     0.30                                95
                      10 items      500 simulees     0.60                                94
                      10 items      1000 simulees    0.00                               100
                      10 items      1000 simulees    0.30                                99
                      10 items      1000 simulees    0.60                                94
                      20 items      250 simulees     0.00                               100
                      20 items      250 simulees     0.30                               100
                      20 items      250 simulees     0.60                               100
                      20 items      500 simulees     0.00                                99
                      20 items      500 simulees     0.30                                96
                      20 items      500 simulees     0.60                                98
                      20 items      1000 simulees    0.00                               100
                      20 items      1000 simulees    0.30                                98
                      20 items      1000 simulees    0.60                                99